Text Embeddings by Weakly-Supervised Contrastive Pre-training
Abstract
The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs).
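To make the contrastive objective concrete, below is a minimal sketch of contrastive training with in-batch negatives. It assumes an InfoNCE-style loss over temperature-scaled cosine similarities, which is a common formulation for this kind of pre-training; the temperature value, tensor shapes, and random inputs are illustrative placeholders rather than the paper's actual setup, standing in for encoder outputs on a batch of (query, passage) pairs from CCPairs.

```python
# Sketch: InfoNCE contrastive loss with in-batch negatives (illustrative, not the paper's code).
import torch
import torch.nn.functional as F


def info_nce_loss(query_emb: torch.Tensor,
                  passage_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """The i-th query's positive is the i-th passage; every other passage
    in the batch acts as a negative."""
    q = F.normalize(query_emb, dim=-1)   # L2-normalize so dot products are cosine similarities
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature       # [batch, batch] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)


# Toy usage: random tensors stand in for encoder outputs on a batch of text pairs.
queries = torch.randn(8, 768, requires_grad=True)
passages = torch.randn(8, 768, requires_grad=True)
loss = info_nce_loss(queries, passages)
loss.backward()  # in real training this would update the shared text encoder
```

Larger batches give more in-batch negatives per positive pair, which is one reason this style of pre-training is typically run with large batch sizes.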
2 Related Works
Most closely related to our work is a series of community efforts by sentence-transformers to train embeddings with a collection of labeled and automatically collected datasets. In this paper, we show that it is possible to train high-quality embeddings using self-supervised pre-training only.